Objective: The authors sought to demonstrate
that schizophrenia is a heterogeneous
group of heritable disorders
caused by different genotypic networks 
that cause distinct clinical syndromes. 

Method: In a large genome-wide associa- 
tion study of cases with schizophrenia and 
controls, the authors first identified sets of 
interacting single-nucleotide polymorphisms 
(SNPs) that cluster within particular individu- 
als (SNP sets) regardless of clinical status. 
Second, they examined the risk of schizo- 
phrenia for each SNP set and tested replica- 
bility in two independent samples. Third, 
they identified genotypic networks com- 
posed of SNP sets sharing SNPs or subjects. 
Fourth, they identified sets of distinct clinical 
features that cluster in particular cases 
(phenotypic sets or clinical syndromes) with- 
out regard for their genetic background. 
Fifth, they tested whether SNP sets were 
associated with distinct phenotypic sets in 
a replicable manner across the three studies. 



Results: The authors identified 42 SNP 
sets associated with a 70% or greater 
risk of schizophrenia, and confirmed 34 
(81%) or more with similar high risk of 
schizophrenia in two independent sam- 
ples. Seventeen networks of SNP sets 
did not share any SNP or subject. These 
disjoint genotypic networks were as- 
sociated with distinct gene products 
and clinical syndromes (i.e., the schizo- 
phrenias) varying in symptoms and 
severity. Associations between geno- 
typic networks and clinical syndromes 
were complex, showing multifinality 
and equifinality. The interactive net- 
works explained the risk of schizophre- 
nia more than the average effects of all 
SNPs (24%). 

Conclusions: Schizophrenia is a group of 
heritable disorders caused by a moderate 
number of separate genotypic networks 
associated with several distinct clinical 
syndromes.

Complex diseases, such as schizophrenia, may be
influenced by hundreds or thousands of genetic variants 
that interact with one another in complex ways, and 
consequently display a multifaceted genetic architecture 
(1). The genetic architecture of heritable diseases refers to 
the number, frequency, and effect sizes of genetic risk 
alleles and the way they are organized into genotypic 
networks (2). In complex disorders, the same genotypic 
networks may lead to different clinical outcomes (a con- 
cept known as multifinality, which is called pleiotropy in 
genetics), and different genotypic networks may lead to 
the same clinical outcome (equifinality, which is also 
described as heterogeneity) (1, 3). In general, geneticists
should expect that many genes affect each
trait and that each gene affects many traits (4). Consequently,
research on complex heritable disorders like schizophre- 
nia is likely to yield weak and inconsistent results unless 
the complexity of their genetic and phenotypic architec- 
ture is taken into account (5). 



For example, twin and family studies of schizophrenia 
consistently indicate that the variability in risk of disease is 
highly heritable (81%) (6, 7), but only 25% of the variability 
has been explained by specific genetic variants identified 
in genome-wide association studies (GWAS) (8). This is 
not surprising for complex disorders like schizophrenia 
because current GWAS methods have been unable to 
characterize the gene-gene interactions (Figure 1A) that 
influence the developing clinical profiles (Figure 1B) in
complex ways (10). The frequent failure to account for 
most of the heritability of complex disorders has been 
called the "missing" (11) or "hidden" (12) heritability 
problem. 

In past studies of schizophrenia, the missing heritability 
problem has been approached by analyzing the explained 
variance in large individual samples or by using meta- 
analysis to combine data sets (9, 13, 14). Efforts have also 
been made to consider the impact of variation related to 
ethnicity, sex, chromosomes, functional observations, or
allele frequency (8). Nevertheless, most of the heritability
of schizophrenia remains unexplained (8).

FIGURE 1. Perception and Visualization of a Genome-Wide Association Study (GWAS)ᵃ

ᵃ Panel A is a matrix corresponding to the genome-wide association data set utilized in this work: Genetic Association Information
Network (GAIN) and non-GAIN schizophrenia samples of the Molecular Genetics of Schizophrenia study (9). Allele values are indicated
as BB (dark blue), AB (intermediate blue), AA (light blue), and missing (black). Panel B is a matrix corresponding to the distinct
phenotypic consequences using data at the symptom level from the Diagnostic Interview for Genetic Studies corresponding to the
GWAS in panel A (see Appendix I, catalog of phenotypic features, and Figures S1 and S2 in the online data supplement). Values are
indicated as present (garnet), absent (salmon), and missing (black). Panel C presents schematics of the "divide and conquer" approach,
in which natural partitions of GWAS data (identified as sets of interacting single-nucleotide polymorphisms [SNPs] or SNP sets) were
cross-matched with decomposed schizophrenia phenotype (identified as clusters of naturally occurring schizophrenia symptoms or
phenotypic sets), revealing a specific and distributed genotypic-phenotypic architecture (networks of SNPs associated with sets of
schizophrenia symptoms). This complex architecture is "invisible" to traditional GWAS.

We have chosen to measure and characterize the com- 
plexity of both the genotypic and the phenotypic archi- 
tecture of schizophrenia (Figure 1C). Past studies have 
generally ignored variation in clinical features, categoriz- 
ing people as either having or not having schizophrenia, 
and they have looked only at the average effects of ge- 
netic variants, ignoring their organization into interactive 
genotypic networks. We postulate that schizophrenia 
heritability is not missing but is distributed into different 
networks of interacting genes that influence different 
people (15-17). Unlike previous studies that neglected 
clinical heterogeneity among subjects with schizophrenia 
(14, 18, 19), we characterized the clinical phenotype in 
detail. We also allowed for possible developmental com- 
plexity, including equifinality (or heterogeneity) and 
multifinality (or pleiotropy). 

We investigated the architecture of schizophrenia in the 
Molecular Genetics of Schizophrenia (MGS) study, in which 
all subjects had consistent and detailed genotypic and 
phenotypic assessments (9). We then replicated the results 
in two other independent samples in which comparable 
genotypic and phenotypic features were available: the 
Clinical Antipsychotic Trial of Intervention Effectiveness 
(CATIE) and the Portuguese Island studies from the 
Psychiatric Genomics Consortium (PGC) (19-23). 

Method 

We first identified sets of interacting single-nucleotide poly- 
morphisms (SNPs) that cluster within subgroups of individuals 
(SNP sets) regardless of clinical status in the MGS Consortium 
study, employing our generalized factorization method (24-27) 
combined with non-negative matrix factorization to identify
candidates for functional clusters (17) (see Figures S1 and S2
in the data supplement that accompanies the online edition of 
this article). This approach performs an unsupervised co- 
clustering of subjects together with distinguishing genotypic/ 
phenotypic features based on the empirical data alone. We 
combined the Genetic Association Information Network (GAIN) 
and non-GAIN samples of the MGS study, which constitute one 
GWAS (9). The 4,196 cases and 3,827 controls in the MGS study 
were combined to identify SNP sets. We had data of good 
quality on 696,788 SNPs on these cases and controls, and from 
these we preselected 2,891 SNPs that had at least a loose 
association (p values <1.0×10⁻²) with a global phenotype of
schizophrenia (see the data supplement). SNP sets were labeled 
by a pair of numbers based on the order in which they were 
chosen by the algorithm (see the data supplement). Each SNP 
set was composed of a particular group of subjects described by 
a particular set of homozygotic and/or heterozygotic alleles; 
subjects and/or SNPs may be present in more than one set (17,
24, 25). The SNP sets identified by our generalized factorization 
method are optimal clusters of SNPs in particular subjects 
that encode AND/OR interactions between SNPs and subjects
(Figure 2A-F, Table 1; see also Figure S3 and the Method section 
in the data supplement). These SNP sets and their relations with one 
another characterize the genetic architecture of schizophrenia- 
associated SNPs in all subjects, including cases and controls 
(Figure 1A). 
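
The non-negative matrix factorization step can be illustrated on a toy matrix. The sketch below is not the generalized factorization method of references 24-27 and is not the PGMRA implementation: it is plain non-negative matrix factorization with Lee-Seung multiplicative updates, and the allele coding (AA=1, AB=2, BB=3), the function names, the rank, and the toy genotypes are all assumptions made for illustration. Each subject and each SNP is assigned to the factor that contributes most to its reconstruction, which gives a crude co-clustering of subjects with SNPs.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (subjects x SNPs) as W (subjects x rank) times H (rank x SNPs)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

def assign(loadings, weights):
    """For each row, pick the factor with the largest weighted loading."""
    return [max(range(len(weights)), key=lambda k: row[k] * weights[k])
            for row in loadings]

# Hypothetical genotypes: 6 subjects x 4 SNPs, coded AA=1, AB=2, BB=3.
# Subjects 0-2 carry BB at SNPs 0-1; subjects 3-5 carry BB at SNPs 2-3.
V = [
    [3, 3, 1, 1],
    [3, 3, 1, 1],
    [3, 3, 1, 1],
    [1, 1, 3, 3],
    [1, 1, 3, 3],
    [1, 1, 3, 3],
]
W, H = nmf(V, rank=2)
# Weight each loading by the total mass of its factor so that the
# assignment does not depend on how scale is split between W and H.
subject_cluster = assign(W, [sum(row) for row in H])
snp_cluster = assign(transpose(H), [sum(col) for col in zip(*W)])
```

On data with this block structure one would expect subjects 0-2 to group with SNPs 0-1 and subjects 3-5 with SNPs 2-3, although the factor labels themselves are arbitrary. Unlike the SNP sets described above, this simple assignment places each subject and each SNP in exactly one cluster and therefore cannot represent overlapping sets.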



Second, we examined the risk of schizophrenia for each SNP 
set and identified those with high risk. The statistical significance 
of the association of SNP sets with schizophrenia was calculated 
using the SNP-Set Kernel Association Test (SKAT) program, 
which properly accounts for multiple comparisons (15-17). 

Third, we checked for significant overlap among SNP sets in 
terms of subjects and/or SNPs using hypergeometric statistics (24, 
25, 28) (see Figures S1 and S2 in the online data supplement). This
allowed us to characterize the relations among SNP sets and to 
identify SNP sets that were connected to each other by having 
certain SNPs or subjects in common, thereby composing geno- 
typic networks. Disjoint networks shared neither SNPs nor subjects, 
as expected if schizophrenia is a heterogeneous group of diseases. 
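
As an illustration of the kind of hypergeometric test referred to here, the sketch below computes the probability that two SNP sets share at least a given number of subjects by chance. It is a generic upper-tail hypergeometric calculation, not the procedure documented in the data supplement. The function name `hypergeom_sf` and the overlap of 20 subjects are hypothetical; the population of 8,023 is the 4,196 cases plus 3,827 controls in the MGS study, and the set sizes of 70 and 258 subjects are two sizes reported for MGS SNP sets.

```python
from math import comb

def hypergeom_sf(overlap, population, size_a, size_b):
    """P(X >= overlap) when size_b subjects are drawn without replacement
    from `population` subjects, of whom size_a belong to the other set."""
    upper = min(size_a, size_b)
    total = comb(population, size_b)
    tail = sum(comb(size_a, k) * comb(population - size_a, size_b - k)
               for k in range(overlap, upper + 1))
    return tail / total

# Hypothetical example: two SNP sets with 70 and 258 subjects that share 20.
# Under independence the expected overlap is 70 * 258 / 8023, about 2 subjects.
p_example = hypergeom_sf(20, 8023, 70, 258)
```

The same function applies to shared SNPs if the population is taken to be the 2,891 preselected SNPs and the sizes are the numbers of SNPs in each set.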

Fourth, we identified sets of distinct clinical features that 
cluster in particular cases with schizophrenia (i.e., phenotypic 
sets or clinical syndromes) without regard for their genetic 
background (29), again using non-negative matrix factorization 
(17). Ninety-three clinical features of schizophrenia from interviews
based on the Diagnostic Interview for Genetic Studies (30),
as well as the Best Estimate Diagnosis Code Sheet submitted by 
GAIN/non-GAIN to dbGaP, were initially considered with the
MGS sample (see references 31, 32; see also Appendix I in the 
online data supplement). The Diagnostic Interview for Genetic 
Studies was utilized for the Portuguese Island samples. Corre- 
sponding features were extracted in CATIE from the Positive and 
Negative Syndrome Scale, the Quality of Life Questionnaire, and 
the Structured Clinical Interview for DSM-IV (23). These pheno- 
typic sets and their relations with one another characterize the 
phenotypic architecture of schizophrenia (Figure 1B).

Fifth, we tested whether SNP sets were associated with distinct 
phenotypic sets in the MGS sample, and we tested the replicability 
of these relations in the two other independent studies. Replica- 
tion was evaluated in terms of replication of the SNP sets and their 
corresponding risk, as well as the relationships between SNP sets 
and phenotypic sets. In the samples that used the Diagnostic 
Interview for Genetic Studies (the MGS and Portuguese Island 
samples), the specific phenotypic features can be compared. Since 
the CATIE study did not use the Diagnostic Interview for Genetic 
Studies, we estimated the corresponding symptoms from available 
phenotypic data (based on the Positive and Negative Syndrome 
Scale, the Quality of Life Questionnaire, and the Structured Clinical 
Interview for DSM-IV). Genotypic and phenotypic data were 
available for 738 cases in CATIE and 346 cases in the Portuguese 
Island study (see the online data supplement). The significance of 
cohesive relations among SNP sets and clinical syndromes was 
tested using hypergeometric statistics (17, 24, 25, 28). The relations 
between the genotypic and phenotypic clusters characterize the 
genotypic-phenotypic architecture (Figure 1C).

Methodological details and references for the "divide and 
conquer" algorithm that we developed and used are available in 
the online data supplement (24-27). Our web server application 
PGMRA (17), for identifying genotype-phenotype relations in GWAS, 
is online at http://phop.ugr.es/fenogeno. Statistical analysis was 
performed by SKAT (15, 16), also accessible through PGMRA. 

Results 

Identifying Many SNP Sets as Candidates for 
Schizophrenia Risk 

We first investigated the genotypic architecture of schizo- 
phrenia in the MGS study to identify SNP sets without 
knowledge of the subject's clinical status (i.e., case or 
control) (9). Our exhaustive search uncovered 723 non- 
identical and possibly overlapping SNP sets in the MGS 
samples. The SNP sets varied in terms of numbers of both
subjects and SNPs. For example, one group contains 70
subjects and 24 SNPs, as expected because few subjects
can share a large number of SNPs. Conversely, another
group contains 258 subjects and three SNPs, as expected
because a large number of subjects are likely to share only
a few SNPs. Initially, we retained a large number of SNP
sets merely to identify the genotypic clusters in all subjects
whether they had schizophrenia or not.

FIGURE 2. Examples of Identified Single-Nucleotide Polymorphism (SNP) Sets Represented as Heat Map Submatrices and
Their Corresponding Riskᵃ

ᵃ Allele values are indicated as BB (dark blue), AB (intermediate blue), AA (light blue), and missing (black). Subject status (i.e., cases and controls)
was superimposed after SNP set identification: cases in red and controls in green. Genotypic SNP sets are labeled by a pair of numbers
representing the maximum number of clusters and the order in which they were selected by the method. All SNP sets are calculated with the
generalized factorization method based on the non-negative matrix factorization method (see the Method section in the online data
supplement). Dendrograms were artificially superimposed for visualization purposes. (See Figure S3 in the data supplement for all SNP sets at
more than 70% of risk.) Panels A-F illustrate SNP sets, representing submatrices of the original genome-wide association study matrix and
composed of shared SNPs and/or subjects. Panel A presents a SNP set exhibiting a homogeneous configuration in which all subjects in that group
share the same interaction among a specific set of homozygotic alleles (i.e., SNP × ... × SNP interactions). Panel B presents a SNP set encoding
subjects exhibiting a particular heterozygotic genotype with respect to the A allele in a subset of SNPs and another heterozygotic genotype with
respect to the B allele in a different subset of SNPs (i.e., AND-type of interactions). Panel C presents a SNP set composed of subjects who share
a particular genotype value for a subset of SNPs, and another subset of subjects sharing a different genotype value for the same subset of SNPs
(i.e., OR-type of interactions). Inclusion-type relations are exemplified by a SNP set (panel A) subsumed under a more general SNP set (panel C),
and both sets provide different descriptions of target subjects. Panels D-F present SNP sets that combine all previous interactions into more
complex structures. Panel G presents a surface representing the risk function of the uncovered SNP sets. The risk (z-axis; red=high, blue=low) was
calculated based on the distribution of subject status (i.e., cases and controls) within each SNP set, and the surface was plotted interpolating the
relation domains. Dendrograms reflect the order adopted for plotting SNP sets. SNP sets were clustered by shared SNPs (x-axis) and by shared
subjects (y-axis) using hypergeometric statistics (see the Method section in the data supplement). (Closely located SNP sets in an edge share more
SNPs and/or subjects than those located far away.)

SNP Sets Vary Greatly in Risk for Schizophrenia 

Second, we computed the risk for schizophrenia in 
carriers of each SNP set (33) (Figure 2A-F; see also Figure
S3 in the online data supplement). The risk of schizophrenia
was normally distributed, as expected when
capturing the full range of variability. Ninety-eight of
the 723 SNP sets had a risk of schizophrenia greater than
66% and accounted for 90% of schizophrenia cases in
the MGS study. Forty-two SNP sets had a risk of
schizophrenia >70% (Table 1; see also Figure S4 in the
data supplement). For example, SNP set 19_2 had a risk
of 100%, meaning that all carriers were schizophrenia
cases. The ability of SNP sets to predict schizophrenia
risk is illustrated in Figure 2G. SKAT showed that the
association of schizophrenia with particular SNP sets
was stronger than with the average effects of their
constituent SNPs (Table 1). For example, the SNP set
81_13 has a p value of 1.46×10⁻¹⁰, whereas the best and
average SNPs within this set have p values of 2.15×10⁻⁷
and 5.44×10⁻³, respectively. SKAT and PLINK (34) methods
estimated similar p values for the individual SNPs (R²=0.99;
p values for F statistics, <3.8×10⁻⁴⁶), showing that SKAT
does not inflate results.

TABLE 1. Single-Nucleotide Polymorphism (SNP) Sets Reported With ≥70% Risk of Schizophrenia, Statistical Comparison With
Individual SNPs, and Compositionᵃ

         SKAT p Values
SNP Set  Group     Average SNP  Best SNP  Worst SNP  Subjects (N)  SNPs (N)  Risk (%)
19_2     2.88E-05  3.43E-02     4.60E-04  1.38E-02   9             9         100
88_64    1.43E-11  2.06E-03     2.15E-07  1.79E-02   176           6         96
81_13    1.46E-10  5.44E-03     2.15E-07  3.70E-02   234           10        95
87_76    7.11E-07  1.05E-02     1.37E-05  3.13E-02   74            3         95
58_29    5.41E-04  6.52E-03     2.07E-04  2.83E-02   125           6         94
83_41    3.87E-05  1.56E-04     1.01E-04  2.68E-04   61            4         93
9_9      1.51E-06  2.52E-03     1.23E-04  1.18E-02   144           19        92
10_4     3.83E-05  1.72E-02     2.11E-04  1.05E-02   58            11        91
14_6     2.38E-06  1.85E-03     1.23E-04  5.87E-03   22            11        90
56_30    1.91E-10  4.33E-03     2.15E-07  2.10E-02   382           11        88
42_37    4.15E-06  2.35E-02     6.59E-05  1.38E-02   70            24        86
65_25    3.95E-05  1.99E-02     2.53E-04  8.83E-02   62            5         86
71_55    1.90E-05  3.99E-04     2.63E-05  1.08E-03   63            6         86
12_11    6.53E-04  2.28E-02     7.34E-03  1.05E-01   94            11        84
90_78    7.87E-04  2.99E-02     3.58E-02  9.53E-02   200           4         83
77_5     4.86E-05  5.01E-04     2.08E-05  1.49E-03   297           5         82
88_8     2.88E-04  2.95E-02     3.58E-02  8.36E-02   32            10        82
51_28    2.07E-04  2.25E-02     1.75E-02  3.13E-02   258           3         81
59_48    2.32E-09  9.48E-03     2.38E-05  2.96E-02   174           7         80
41_12    1.36E-03  1.62E-02     1.12E-01  2.17E-02   78            3         76
22_11    6.24E-05  4.29E-04     1.33E-04  1.08E-03   97            12        75
13_12    4.52E-05  3.61E-04     5.88E-05  1.45E-03   148           10        75
31_22    1.01E-04  2.37E-04     1.11E-04  4.03E-04   92            7         74
85_84    1.53E-05  1.01E-04     1.37E-05  1.81E-04   39            4         74
87_84    1.19E-04  1.40E-02     1.37E-05  1.30E-02   22            13        74
16_10    1.81E-03  1.59E-02     2.92E-03  5.92E-02   141           12        73
56_19    2.02E-04  6.69E-04     1.02E-04  1.76E-03   90            5         73
75_31    2.61E-05  1.37E-02     1.02E-04  9.53E-02   197           8         73
81_73    1.13E-05  2.99E-02     2.57E-04  1.29E-02   213           10        73
85_23    6.20E-03  9.46E-03     5.58E-03  1.16E-02   53            4         73
21_8     6.24E-05  4.29E-04     1.33E-04  1.08E-03   188           12        71
76_74    1.58E-17  1.33E-02     1.12E-05  1.17E-02   284           14        71
61_39    1.04E-03  2.43E-02     1.90E-03  5.45E-02   51            3         71
75_67    3.76E-18  7.16E-02     2.15E-07  1.00E-03   877           32        71
76_63    2.07E-02  2.25E-02     1.75E-02  3.13E-02   34            3         71
81_3     6.24E-05  4.29E-04     1.33E-04  1.08E-03   107           12        71
87_26    2.49E-03  6.03E-03     4.14E-03  1.12E-02   28            5         71
88_43    1.37E-04  1.85E-03     6.03E-04  4.82E-03   70            7         71
25_10    3.49E-06  1.67E-03     1.11E-04  1.53E-02   124           9         70
12_2     1.81E-03  1.59E-02     2.92E-04  5.92E-02   194           12        70
52_42    5.70E-05  5.06E-03     6.59E-05  3.60E-02   87            16        70
54_51    1.49E-05  5.01E-04     2.08E-04  1.49E-03   132           5         70

ᵃ SKAT=SNP-Set Kernel Association Test.

RISK ARCHITECTURE OF THE SCHIZOPHRENIAS



FIGURE 3. Dissection of a Genome-Wide Association Study (GWAS) and Identification of the Genotypic and Phenotypic
Architecture of Schizophreniaᵃ

ᵃ Panel A presents a genotypic network, in which nodes indicate SNP sets linked by shared SNPs (blue lines) and/or subjects (red lines). The risk
value, which was incorporated after the SNP set identification, was color-coded. The 42 SNP sets harboring >70% of risk were topologically

The global variance in liability to schizophrenia explained
by the average effects of all SNPs simultaneously
(8, 35) in our sample was 24%. While individual SNPs were 
mostly low penetrant, many high-risk SNP sets were highly 
penetrant (e.g., 100% to 70%; see Table 1) and much more 
informative in predicting schizophrenia risk. 
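
The notion of risk used for a SNP set can be sketched as the share of its carriers who are cases, which is the reading given above for SNP set 19_2 (100% risk, all carriers being cases). This is a simplified illustration: the estimator actually used is described in reference 33 and the data supplement, and the function name `snp_set_risk` and the status labels are assumptions. The overall proportion of cases in the MGS sample is computed for comparison, as the value one would expect for a SNP set unrelated to diagnosis.

```python
def snp_set_risk(carrier_status):
    """Percentage of the carriers of a SNP set who are schizophrenia cases."""
    cases = sum(1 for status in carrier_status if status == "case")
    return 100.0 * cases / len(carrier_status)

# SNP set 19_2 in Table 1: 9 subjects, all of them cases.
risk_19_2 = snp_set_risk(["case"] * 9)

# Hypothetical SNP set with 7 cases and 3 controls among its carriers.
risk_toy = snp_set_risk(["case"] * 7 + ["control"] * 3)

# Proportion of cases in the whole MGS sample (4,196 cases, 3,827 controls).
base_rate = 100.0 * 4196 / (4196 + 3827)
```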

Relations Among SNP Sets to One Another and to 
Gene Products 

We hypothesized that schizophrenia may be an etiolog- 
ically heterogeneous group of illnesses in which some 
genotypic networks are disjoint, that is, share neither SNPs 
nor subjects. To test this, we first checked for overlap in 
constituent SNPs and/or subjects among all the SNP sets at
high risk for schizophrenia (see Figure S5 in the online data 
supplement). We found that 17 genotypic networks were 
disjoint, sharing neither SNPs nor subjects (Figure 3A), 
suggesting that these are distinct antecedents of schizo- 
phrenia. These networks vary in size and complexity: 
one highly connected network associates 11 SNP sets, 
whereas eight networks are composed of only a single 
isolated SNP set. 
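
The idea of disjoint genotypic networks can be sketched as a connected-components problem: SNP sets are nodes, and two nodes are linked when they have SNPs or subjects in common. The sketch below is a simplification under stated assumptions. It links SNP sets on any shared element, whereas the analysis described in the Method section tests for significant overlap with hypergeometric statistics, and the function name, SNP set labels, SNP identifiers, and subject numbers are all hypothetical.

```python
def genotypic_networks(snp_sets):
    """Group SNP sets into networks. `snp_sets` maps a label to a pair
    (set of SNPs, set of subjects); two SNP sets are linked if they
    share at least one SNP or at least one subject."""
    names = list(snp_sets)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            snps_a, subjects_a = snp_sets[a]
            snps_b, subjects_b = snp_sets[b]
            if snps_a & snps_b or subjects_a & subjects_b:
                parent[find(a)] = find(b)

    networks = {}
    for name in names:
        networks.setdefault(find(name), set()).add(name)
    return sorted(networks.values(), key=lambda members: sorted(members))

# Hypothetical SNP sets: A and B share a SNP, B and C share a subject,
# and D shares nothing with the others.
toy_sets = {
    "A": ({"snp1", "snp2"}, {1, 2}),
    "B": ({"snp2", "snp3"}, {3, 4}),
    "C": ({"snp9"}, {4, 5}),
    "D": ({"snp7"}, {8}),
}
toy_networks = genotypic_networks(toy_sets)
```

In this toy example A, B, and C form one network and D is an isolated SNP set, the same kind of structure as the single-set networks reported above.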

We also determined that some SNP sets share SNPs but 
not subjects (e.g., 59_48 and 87_76; Figure 3A), as expected 
because they involve the same SNPs but with different allele 
values (both alleles of a SNP can act as risk alleles in 
different genetic contexts). In contrast, we found that the 
58_29 and 41_12 SNP sets do not share SNPs, but inde- 
pendently specify almost the same individuals (Figure 3A), 
as expected when, for example, distinct subsets of genotypic 
features influence a common developmental pathway. 
Finally, some SNP sets overlap in both SNPs and subjects, 
suggesting that one is a subset within the other (e.g., 88_64 
and 81_13; see Figure S3A,C in the online data supplement). 
Therefore, the genotypic networks display distinct topolo- 
gies differing in the way constituent SNPs and subjects are 
related. 

When evaluating whether different genotypic net- 
works operate through distinct mechanisms, we found 
that high-risk SNP sets mapped to various classes of 
genes (e.g., protein coding, ncRNA genes, and pseudo- 
genes) related to known functions and causing different 
effects on their products (Figure 3A; see also Tables 
S1-S3 and Figure S6 in the online data supplement). We 
identified distinct pathways as exemplified in Table 2. 



Notably, all of these pathways are interconnected by the 
overlapping gene products that include genes pre- 
viously associated with schizophrenia by GWAS, as well 
as genes known to be abnormally expressed in the 
brains of schizophrenia patients (see Table S4, Figure 
S7, and the Pathways section in the data supplement). 
The emerging picture is suggestive of a possible path- 
ophysiology in which abnormal brain development 
interacts with environmental events triggering abnor- 
mal or exaggerated immune and oxidative processes 
that increase risk of schizophrenia. 

Complex Genotypic-Phenotypic Relationships in 
Schizophrenia 

Next we examined whether the complex genetic archi- 
tecture of schizophrenia leads to phenotypic heterogene- 
ity. Using data from the Diagnostic Interview for Genetic 
Studies (30), as well as from the Best Estimate Diagnosis 
Code Sheet submitted by GAIN/non-GAIN to dbGaP (see
Appendix I, Figures S1 and S2, and the Method section in
the online data supplement), we originally identified 342 
nonidentical and possibly overlapping phenotypic sets of 
distinct clinical features that cluster in particular cases 
with schizophrenia (i.e., phenotypic sets or clinical syn- 
dromes) without regard for their genetic background. 
Different SNP sets were significantly associated with 
particular clinical syndromes (hypergeometric statistics, 
p values from 2×10⁻¹³ to 1×10⁻³). However, the genotypic-
phenotypic relations were complex (i.e., many-to-many [29]):
the same genotypic network could be associated with 
multiple clinical outcomes (i.e., multifinality or pleiotropy) 
and different genotypic networks could lead to the same 
clinical outcome (i.e., equifinality or heterogeneity; Table 3; 
see also Table S5 in the data supplement). The genotypic-
phenotypic relations were highly significant by a permutation
test (empirical p value <4.7×10⁻³; Table 3; see also
Table S5). 
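
The many-to-many structure can be made concrete with a small bookkeeping sketch. Given a list of significant genotype-phenotype associations, it reports genotypic units linked to more than one phenotypic set (multifinality) and phenotypic sets linked to more than one genotypic unit (equifinality). The labels G1-G3 and P1-P3 are hypothetical and do not correspond to entries in Table 3, and the function name is an assumption.

```python
from collections import defaultdict

def classify_relations(pairs):
    """Split (genotypic unit, phenotypic set) associations into
    multifinal genotypes and equifinal phenotypes."""
    by_genotype = defaultdict(set)
    by_phenotype = defaultdict(set)
    for genotype, phenotype in pairs:
        by_genotype[genotype].add(phenotype)
        by_phenotype[phenotype].add(genotype)
    # Multifinality: one genotypic unit, several clinical outcomes.
    multifinal = {g: p for g, p in by_genotype.items() if len(p) > 1}
    # Equifinality: several genotypic units, one clinical outcome.
    equifinal = {p: g for p, g in by_phenotype.items() if len(g) > 1}
    return multifinal, equifinal

# Hypothetical associations.
toy_pairs = [("G1", "P1"), ("G1", "P2"), ("G2", "P2"), ("G3", "P3")]
multifinal, equifinal = classify_relations(toy_pairs)
```

Here G1 is multifinal (associated with P1 and P2), P2 is equifinal (reached from G1 and G2), and G3 with P3 is a one-to-one relation.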

Specifically, we identified a phenotypic set indicating 
a general process of severe deterioration (i.e., continuous 
positive symptoms with marked and progressive impair- 
ment) that was associated with many SNP sets (e.g., SNP 
sets 75_67 and 56_30, with p values <2.3×10⁻¹³ and
2.55×10⁻⁵, respectively; Table 3, Figure 3A). Other SNP
sets were associated with a general process of moderate 
deterioration (moderate or fluctuating impairment despite 



organized into 17 disjoint subnetworks. Subsets of implicated genes are indicated. Highly connected SNP sets based on shared SNPs (blue lines) 
and subjects (red lines) might share a phenotypic profile (e.g., 81_13 and 88_64; see Table 3). Yet a super-SNP set, such as 81_13, may have 
unique — in addition to common — descriptive phenotypic features (see Table 3). Disconnected SNP sets, such as 71_55 and 14_6, belong to 
disjoint networks that may include the same gene (i.e., NTKR3; see Table S1 and Figure S6B in the online data supplement) but carry SNPs that 
are located in the promoter and coding region, respectively. Both SNPs may produce distinct molecular consequences (see Table S3 and Figure 
S6B in the data supplement) and phenotypic profiles (see Table 3). Panel B shows the classes of schizophrenia mapped to the disease architecture 
(see Table 3). Eight classes of schizophrenia were identified by independently characterizing each phenotypic feature included in a genotypic- 
phenotypic relationship; classifying each item based on the symptoms as purely positive, purely negative, primarily positive, or primarily 
negative symptoms; and clustering these relationships based on their recoded phenotypic domain using non-negative matrix factorization. SNP 
sets harboring only positive symptoms are indicated in red, whereas those displaying negative symptoms are in green. Intermediate 
combinations including severe and/or moderate processes combined with positive and/or negative and/or disorganized symptoms were also 
color-coded. Dashed lines indicate nonsignificant matching. 
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a continuous mixture of symptoms), as in SNP sets 14_6, 
and 42_37 (p values <5X10" 4 ; Table 3, Figure 3A). 

We identified specific clinical syndromes that were 
unambiguously associated with particular genotypic net- 
works. For example, specific phenotypic sets differentiate 
among SNP sets even within the same network, which 
illustrate similar but not identical forms of multifinality in 
schizophrenia (e.g., 76_74 and 58_29; Table 3, Figure 3A, 
blue lines). Particular phenotype sets can also distinguish 
SNP sets connected only by shared subjects (Figure 3A, red 
lines). For example, SNP set 76_74 shares subjects with 
56_30 and with 81_13; however, the latter SNP sets are 
associated with a specific phenotypic set not present in 
76_74 (Table 3). 

Positive and Negative Symptoms Differentiate 
Classes of Schizophrenia 

Genotypic and phenotypic relationships could be 
grouped into eight classes of schizophrenia, as shown in 
Figure 3B and Table 3 (31, 32, 36). First, we identified SNP 
sets involving subjects with predominantly positive symp- 
toms (e.g., 41_12 and 88_64) and few residual symptoms. 
Second, we identified SNP sets represented by predomi- 
nantly negative and disorganized symptoms (e.g., 10_4 
and 61_39), decreased psychosocial function, and contin- 
uous residual symptoms. As discussed in the online data 
supplement (see the Replicability of the Phenotypic 
Features section), bizarre delusions and symptoms of 
cognitive and behavioral disorganization, such as thought 
insertion and disorganized speech among others, were 
accepted as fuzzy indicators of either positive or negative 
classes of schizophrenia but were considered to be more 
common in negative and disorganized classes (e.g., in 
Table 3, thought echo and commenting hallucinations in 
"negative schizophrenia" with phenotypic set 46_29 as- 
sociated with SNP set 14_6). 

Third, several SNP sets harbor mixed positive and 
negative symptoms (e.g., 59_48 and 54_51). These three 
classes were enriched by considering the general severe 
and moderate patterns, which were frequent in several 
networks (Figure 3B), as described above. Because the 
latter patterns appear in combination with a set of only 
positive symptoms (e.g., 81_13), both positive and negative 
symptoms (e.g., 75_67), and only negative symptoms 
(e.g., 19_2), we were able to classify schizophrenia 
into eight classes (Figure 3B). A principal-components 
analysis of the phenotypic features in the Diagnostic 
Interview for Genetic Studies confirmed this classifica- 
tion (see Table S6 and the Method section in the online 
data supplement). 

Replication of Results in Two Independent Samples 

We tested the replicability of our findings in the MGS 
study by carrying out the same analyses of the genotypic 
and phenotypic architecture of schizophrenia in the 
CATIE (19, 22, 23) and Portuguese Island (19, 21) samples. 



A total of 1,303 SNPs were shared between the selected 
SNPs in the MGS (see the Data Cleaning section in the 
online data supplement) and CATIE samples, and 1,234 
SNPs between the MGS and Portuguese Island samples. 
Imputed variants were not considered, to avoid possible 
biases. 

We found that 31 and 30 of the 42 SNP sets selected in 
the MGS sample were also identified in the CATIE and 
Portuguese Island samples, respectively (see Tables S7 and 
S8 in the online data supplement). Together, both samples 
reproduced at least 81% of the SNP sets at risk (see Table S9 
in the data supplement). In addition, most of the SNP sets 
replicated in the two PGC samples achieved risk values as 
high as those of the MGS sample (>70%) (see Table S8): 
70% of those identified exhibit >70% risk, and 90% show 
>60% risk. Some SNP sets exhibited slightly higher risk 
values than those in the MGS sample. 
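
The replication figures can be checked with simple arithmetic. The sketch below recomputes the percentages from the counts given in the text and uses inclusion-exclusion to bound how many SNP sets were replicated in both samples. The figure of 34 is the lower bound stated in the abstract ("34 (81%) or more"); the bounds on the number replicated in both samples are derived here and are not reported in the article, and the variable names are assumptions.

```python
TOTAL_HIGH_RISK_SETS = 42    # SNP sets with >70% risk in the MGS sample
CATIE_REPLICATED = 31        # also identified in CATIE
PORTUGUESE_REPLICATED = 30   # also identified in the Portuguese Island sample
REPLICATED_IN_EITHER = 34    # lower bound quoted in the abstract

def percent(part, whole):
    return 100.0 * part / whole

catie_pct = percent(CATIE_REPLICATED, TOTAL_HIGH_RISK_SETS)
portuguese_pct = percent(PORTUGUESE_REPLICATED, TOTAL_HIGH_RISK_SETS)
either_pct = percent(REPLICATED_IN_EITHER, TOTAL_HIGH_RISK_SETS)

# Inclusion-exclusion: |both| = |CATIE| + |Portuguese| - |either|.
# At least 34 in either sample means at most 27 in both; at most 42 in
# either sample means at least 19 in both.
max_in_both = CATIE_REPLICATED + PORTUGUESE_REPLICATED - REPLICATED_IN_EITHER
min_in_both = max(0, CATIE_REPLICATED + PORTUGUESE_REPLICATED - TOTAL_HIGH_RISK_SETS)
```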

The genotypic-phenotypic relations in CATIE and the 
Portuguese Island studies closely matched those observed 
in the MGS study (hypergeometric statistics, p values 
1×10⁻⁷ to 1×10⁻²; see Tables S7 and S8 and the Replicability
section in the data supplement). The eight
schizophrenia classes exhibited high reproducibility. For 
example, except for one relation ("-" in the MGS study and 
"+ and -" in CATIE; see Table S9 in the data supplement), 
all relations exhibited similar positive and negative 
symptoms in the MGS study and CATIE. Three relations 
showed less specific symptoms in CATIE than in the MGS 
study, as expected because CATIE did not use the Di- 
agnostic Interview for Genetic Studies (see Table S10 and 
the Replicability section in the data supplement). 
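
The hypergeometric statistics referred to above test whether the subjects in a SNP set overlap with the subjects in a phenotypic set more than expected by chance. The sketch below illustrates that kind of test using only the standard library. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: total number of subjects
    K: subjects who belong to the phenotypic set
    n: subjects who belong to the SNP set
    k: observed number of subjects in both sets
    """
    lower = max(k, 0)
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass over all overlaps at least as large as k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only: 1,000 cases, 50 of them in a
# phenotypic set, 40 in a SNP set, and 10 shared between the two
# (about 2 would be expected by chance).
p_value = hypergeom_sf(k=10, N=1000, K=50, n=40)
print(f"p = {p_value:.2e}")
```

A small p value here means that the overlap between the two sets is unlikely under random assignment of subjects, which is the sense in which a genotypic-phenotypic relation is called significant.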

We found few differences when comparing the MGS and 
Portuguese Island studies (see Table S9 in the data 
supplement), except differences in severity that preserved 
the sign of the symptoms. Three relations with negative 
symptoms in the MGS study exhibited negative and pos- 
itive symptoms in the Portuguese Island sample (see Table 
S9). Only two SNP sets in the Portuguese Island sample 
had no significant cross-match with the phenotypic features 
expected from the MGS study. 

Discussion 

Our findings indicate that schizophrenia comprises 
several distinct clinical syndromes associated with many 
disjoint genotypic networks. Consequently, much of the 
heritability of schizophrenia has not been detected by ap- 
proaches that classify subjects only according to whether 
or not they have schizophrenia. Our purely data-driven 
analysis shows that the elusive heritability of schizophre- 
nia is not missing, but is encoded in a complex distribution 
of genotypic-phenotypic relationships. 

TABLE 2. Examples of Products of Genes Uncovered by the SNP Sets in Interconnected Signaling Pathways^a

| Signaling Pathways/Function | Genes | SNP Sets | Symptoms |
|---|---|---|---|
| Neural development | DKK4, STYK1, VANGL1 | 75_67 | Severe process, + and - |
| | NCAM1 | 42_37 | Moderate process, + and - |
| | | 52_42 | Moderate process, - |
| | CHST9 | 81_73 | |
| | EML5 | 13_12 | |
| | SEM3A | 9_9 | Moderate process, - |
| Neurotrophin function | NTRK3 | 75_67 | Severe process, + and - |
| | Upstream region | 71_55 | + and - |
| | SNTG1 | 81_13 | Severe process, + |
| | MAGEH1 | 25_10 | Severe process, + |
| Neurotransmission | NETO2 | 76_74, 75_67 | Severe process, with + and - |
| | OPN5 | 31_22 | + |
| | NALCN | 87_26 | Moderate process, continuous + |
| Neuronal function and neurodegenerative disorders | SPATA7, ZC3H14 | 13_12 | |
| | SLC20A2 | 41_12 | + |

^a The 42 SNP sets at high risk for schizophrenia involved at least 96 gene loci, including 54 protein-coding loci and 42 polymorphisms at
regulatory sites, as well as 112 polymorphisms in either intergenic or unannotated regions (see full Tables S1 and S4 and Figure S7 in the
online data supplement).

We found that 42 interactive SNP sets had greater than
70% risk of schizophrenia. The interactive SNP sets
explained the risk more fully than the average effects of all
SNPs simultaneously and were more strongly related to
their particular syndromes of schizophrenia than are their
individual SNPs (Table 1). Consequently, identifying the
organization of SNPs into interactive SNP sets enabled us
to increase the power to detect associations: 98 SNP sets
with greater than 66% risk accounted for 90% of cases. The
constituent genes in these networks belong to signaling
pathways highly associated with schizophrenia (see Figure
S7 in the online data supplement). Our findings have
broad implications, so we will consider their strengths and
limitations carefully.
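
To make the two quantities in the preceding paragraph concrete, the sketch below computes the risk of a SNP set and the share of cases covered by high-risk SNP sets on toy data. It assumes, for illustration only, that risk is the fraction of subjects carrying the SNP set who are cases. The function names, subject identifiers, and numbers are hypothetical and do not come from the study.

```python
def snp_set_risk(carriers: set, cases: set) -> float:
    """Fraction of the subjects carrying a SNP set who are cases (assumed definition)."""
    if not carriers:
        raise ValueError("a SNP set must contain at least one subject")
    return len(carriers & cases) / len(carriers)


def case_coverage(snp_sets: dict, cases: set, min_risk: float) -> float:
    """Fraction of cases carrying at least one SNP set with risk above min_risk."""
    high_risk = [c for c in snp_sets.values() if snp_set_risk(c, cases) > min_risk]
    covered = set().union(*high_risk) & cases
    return len(covered) / len(cases)


# Toy data: five cases (c1..c5) and two controls (k1, k2).
cases = {"c1", "c2", "c3", "c4", "c5"}
snp_sets = {
    "A": {"c1", "c2", "c3", "k1"},  # 3 of 4 carriers are cases
    "B": {"c4", "k1", "k2"},        # 1 of 3 carriers is a case
    "C": {"c3", "c4"},              # both carriers are cases
}

risks = {name: snp_set_risk(carriers, cases) for name, carriers in snp_sets.items()}
coverage = case_coverage(snp_sets, cases, min_risk=0.70)
print(risks, coverage)
```

Lowering the risk threshold admits more SNP sets and therefore covers more cases, which is the trade-off behind the two figures quoted in the text (42 sets above 70% risk versus 98 sets above 66% risk).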

Strengths and Limitations 

Two particular features of our methods merit consider- 
ation in terms of their strengths and limitations. First, we 
concurrently used detailed assessments of both the geno- 
type and the phenotype to identify their associations, 
thereby combining genomic and phenomic information 
(29). Other approaches decrease the number of variables 
before analysis ("data reduction"), even if the biological 
importance of these variables is not known a priori. The 
evidence we have that schizophrenia is a heterogeneous 
group of disorders suggests that reducing clinical infor- 
mation about schizophrenia to a single categorical diag- 
nosis is inadequate. 

Despite the detailed phenotypic information we had 
available about subjects, there are still limitations to data 
obtained even from reliable structured interviews like the 
Diagnostic Interview for Genetic Studies. Interview data 
are based on self-reports that are interpreted and coded 
by interviewers. Subjects may not be willing or able to 
report their symptoms accurately. We had obtained 
information from treatment records and family history 
reports, but we chose not to use such additional information,
except for the resulting best-estimate final
DSM ratings of diagnosis, because its extent and quality
varied in unmeasured ways between cases. The greatest
limitation in the phenotypic assessments in available 
GWAS databases has been the overreliance on subjective 
symptoms with an absence of objective measurements, 
such as cognitive tests, brain electrophysiology, and 
neuroimaging (37). Subjective symptoms are fuzzy in- 
dicators of the underlying pathophysiology. Objective 
measures could complement the assessment of symp- 
toms and could be applied to both cases and controls, 
thereby providing a more comprehensive and valid char- 
acterization of the phenotype of all subjects. The biggest 
challenge in GWAS is access to studies with rich phe- 
notypic data about both subjective and objective meas- 
ures obtained systematically from all subjects. 

Our finding of robust replicability based on detailed
symptom profiles from interview data alone does have important
implications for the size of samples and the scope
of phenotypic assessments in genomic studies of complex 
disorders. We obtained robust replication of results in 
moderate size samples, such as 738 cases in CATIE and 346 
cases in the Portuguese Island study, which shows that it 
is incorrect to assume that extremely large samples are 
needed to obtain robust and replicable findings. Difficulty
in replication in previous work can be better explained by
the neglect of the complexity of genetic and phenotypic
architecture than by moderate sample size. We
identified more information by combining rich data about 
the complex architecture of genotypes and interview-based 
symptoms in such moderate size samples than has been 
obtained in analysis of much larger compilations of multiple 
samples that relied on additive gene effects on categorical 
diagnosis (8). 

TABLE 3. Subset of Genotypic-Phenotypic AND/OR Relationships (Hypergeometric Statistics)^a

| Schizophrenia Class | Symptoms^b and DSM Ratings | Phenotypic Sets | SNP Sets | P |
|---|---|---|---|---|
| Severe process, with positive and negative symptom schizophrenia | Positive symptoms; moderate severity of impairment; unable to function since onset | 15_13 | 56_30 | 2.55E-05 |
| | Auditory hallucinations (2 or more voices; running commentaries) | 12_11 | | 1.79E-04 |
| | Auditory hallucinations (2 or more voices; running commentaries); thought echoing; withdrawal; insertion and broadcasting; delusions of mind reading | 21_1 | | 3.66E-04 |
| | Hallucinations (any); auditory hallucinations (ever; 2 or more voices); grossly disorganized behavior | 50_46 | | 5.70E-04 |
| | Hallucinations (mood incongruent); auditory hallucinations; somatic hallucinations (olfactory; gustatory; tactile); religious delusions; delusions of mind reading; delusions of control; thought echoing; withdrawal; insertion and broadcasting | 9_6 | | 4.45E-03 |
| | Hallucinations (mood incongruent); persecutory delusions; delusions of reference; jealousy delusions; bizarre delusions; disorganized odd behavior; disorganized odd speech; delusions, fragmented (unrelated themes); delusions, widespread (intrude into most aspects of life); thought insertion; flat affect; avolition and apathy | 46_23 | | 4.15E-03 |
| | Continuously positive symptoms; severe impairment; continuous course; no affective symptoms | 15_13 | 75_67 | 2.31E-13 |
| | Grossly disorganized behavior; severe impairment; continuous course | 54_11 | | 4.90E-06 |
| | Delusions of persecution and reference; disorganized speech; severe impairment; unable to function since onset | 30_17 | | 2.56E-04 |
| | Auditory hallucinations (ever; 2 or more voices; running commentaries); jealousy delusions | 18_13 | | 3.50E-04 |
| | Thought insertion and withdrawal | 27_6 | | 3.62E-03 |
| | Hallucinations (any); auditory hallucinations (2 or more voices); grossly disorganized behavior | 50_46 | | [illegible] |
| | Delusions, persecutory and reference; delusions, widespread (intrude into most aspects of life) | 61_18 | | 4.28E-03 |
| | Disorganized; odd speech | 64_11 | | 1.45E-03 |
| | Delusions, widespread (intrude into most aspects of life); continuous course | 65_64 | | 1.21E-03 |
| | Continuously positive symptoms; severe impairment; unable to function since onset; no affective symptoms | 15_13 | 76_74 | 1.07E-07 |
| | Delusions, widespread (intrude into most aspects of life) | 65_64 | | 1.47E-03 |
| Positive and negative schizophrenia | Auditory hallucinations; delusions (any); bizarre delusions; disorganized speech and behavior; flat affect; alogia; avolition | 12_4 | 59_48 | 1.88E-04 |
| | Auditory hallucinations (2 or more voices; running commentaries) | 42_9 | 71_55 | 1.98E-03 |
| Negative schizophrenia | Thought insertion and withdrawal | 52_28 | 58_29 | 1.44E-04 |
| | Disorganized speech; odd speech | 7_3 | 9_9 | 1.97E-04 |
| | Flat affect; persecutory delusions | 48_41 | | 2.23E-03 |
| | Delusions of mind reading; guilt delusions; sin delusions; jealousy delusions | 26_8 | | 4.20E-03 |
| | Flat affect; apathy; avolition | 69_41 | 22_11 | 5.52E-05 |
| | Flat affect; apathy; avolition; alogia; continuous mixture of positive and negative symptoms | 10_5 | | 4.62E-04 |
| | Disorganized and odd speech | 17_2 | | 1.01E-04 |
| Positive schizophrenia | Hallucinations (any); auditory hallucinations (ever; 2 or more voices); no affective symptoms | 63_24 | 88_64 | 3.45E-04 |
| | Delusions of jealousy; auditory hallucinations (running commentaries) | 69_66 | | 4.49E-03 |
| Severe process, positive schizophrenia | Continuously positive symptoms; severe impairment; unable to function since onset; no affective symptoms | 22_13 | 77_5 | 5.66E-05 |
| | Auditory hallucinations (2 or more voices; running commentaries) | 18_13 | | 3.25E-03 |
| | Hallucinations (any); auditory hallucinations (2 or more voices; running commentaries); continuous course | 53_6 | | 4.76E-03 |
| | Auditory hallucinations (ever; voices; noises; music) | 59_41 | | 1.22E-03 |
| | Continuously positive symptoms; severe impairment; unable to function since onset; no affective symptoms | 20_19 | 81_13 | 2.83E-04 |
| | Hallucinations (any); auditory hallucinations (ever; 2 or more voices); bizarre delusions; delusions, fragmented (unrelated themes); delusions, widespread (intrude into most aspects of life) | 55_7 | | 8.57E-04 |
| | Delusions of reference; delusions of persecution | 34_17 | | 2.40E-03 |
| | Auditory hallucinations (running commentaries); jealousy delusions | 69_66 | | 1.30E-03 |
| | Severe impairment; unable to function since onset; no affective symptoms | 27_7 | 25_10 | 4.76E-06 |
| | Auditory hallucinations (2 or more voices; running commentaries) | 18_13 | | 9.50E-05 |
| | Auditory hallucinations (ever; voices; noises; music); auditory hallucinations (2 or more voices; running commentaries); thought echoing | 4_1 | | 2.49E-03 |
| | Delusions of reference; delusions of persecution | 66_54 | | 2.10E-03 |
| | Bizarre delusions; delusions of mind reading; delusions, widespread (intrude into most aspects of life) | 8_4 | | 1.93E-03 |
| Moderate process, disorganized negative schizophrenia | Grossly disorganized or catatonic behavior; disorganized speech | 51_38 | 19_2 | 4.03E-04 |
| | Moderate deterioration; unable to function since onset; no affective symptoms | 42_7 | 14_6 | 4.96E-04 |
| | Grossly disorganized and inappropriate behavior | 18_3 | | 2.55E-03 |
| | Auditory hallucinations (running commentaries); thought echoing | 46_29 | | 3.78E-03 |
| Moderate process, positive and negative schizophrenia | Hallucinations (any); auditory hallucinations (ever; voices; noises; music); continuous mixture of positive and negative symptoms; continuous course; moderate impairment; unable to function since onset; no affective symptoms | 5_2 | 42_37 | 1.32E-04 |
| | Bizarre delusions; delusions of reference | 57_39 | | 4.70E-03 |
| | Continuous mixture of positive and negative symptoms; continuous course; moderate impairment; unable to function since onset; no affective symptoms | 11_5 | 88_43 | 6.88E-04 |
| | Auditory hallucinations (ever); bizarre delusions; delusions, fragmented (unrelated to theme) | 24_4 | 51_28 | 9.58E-04 |
| Moderate process, continuous positive schizophrenia | No affective symptoms | 48_7 | 16_10 | 1.44E-03 |
| | Continuously positive symptoms; severe impairment; unable to function since onset; no affective symptoms | 28_23 | 83_41 | 3.48E-03 |
| | Continuously positive symptoms; no affective symptoms | 25_20 | 87_26 | 4.22E-03 |

^a See Appendix I and full Table S5 in the online data supplement.

^b Symptoms were assessed with the Diagnostic Interview for Genetic Studies.

Nearly all previous genetic association studies have relied
on patient interviews for clinical description, as detailed
objective testing has been impractical in large samples
because of cost and the difficult logistics of securing
cooperation with time-consuming test batteries. Now that
we have shown that replicable results can be obtained in
moderate size samples, it is feasible to complement
interview data with more objective and thorough assessments.
Fundamental research into the causes and characteristics
of the schizophrenias is likely to require phenotypic
assessment beyond the clinical features needed for clinical
diagnosis and treatment, which has given primacy to signs
and symptoms assessable by interview alone (37).

Second, we have strived to extract the maximum
information available in a single GWAS without making
restrictive a priori assumptions. In other words, we
proceeded in a data-driven, model-free manner. As a
consequence, whatever information emerged from the data
mining process (such as the different classes of
schizophrenia) is inherent to the data and was not artificially
imposed by either an a priori model or previous knowledge 
of the data (such as the "case or control" status of the 
subjects). Nevertheless, our initial pool of 2,891 SNPs, 
preselected for at least loose association with schizophre- 
nia in the MGS study (9), might be missing additional risk 
SNPs that would eventually show up in an even more 
exhaustive genomic analysis. 

Our findings about the heterogeneity and complexity of 
schizophrenia (31, 32, 36) require a careful reconsideration 
of the concept of "replicability." In order to be meaningful 
in complex disorders like schizophrenia, efforts to replicate 
findings must take into account the distributed heritability 
and developmental complexity of the disease. 
Patient Perspectives



A patient with severe process, positive schizophrenia 
(associated with SNP set 81_13 and phenotypic sets 
20_19 and 34_17) 

"Ms. A" was a 23-year-old woman with DSM-IV schizo- 
phrenia and no history of substance abuse, depression, or 
mania. She was born 2 months premature due to maternal 
preeclampsia. At age 5, she taped the mouths of her dolls to 
try to stop her hallucinations of their calling her name and 
whispering to her. At age 7, she developed delusions of 
persecution and reference (as in phenotypic set 34_17) and 
the voices became louder. At age 9, she was diagnosed with 
paranoid schizophrenia and began treatment with antipsy- 
chotics. Her delusions about her classmate's harmful inten- 
tions provoked fights, so she dropped out of high school. Her 
delusions became widespread but not bizarre. Her hallucinations
and delusions never remitted, and she developed no
negative symptoms or disorganized speech or behavior. She
had continuous and progressive deterioration without associated
affective symptoms, so that she was unable to work
or marry (i.e., severe process as in phenotypic set 20_19). On
mental status examination, she had appropriate behavior, oddly vague
speech without loose associations, well-modulated affect, 
average intelligence, and poor insight and judgment. She felt 
she was being watched and followed. 

Ms. A's clinical profile was specifically associated with 
the SNP set 81_13, which had a 95% risk of schizophrenia. 
This SNP set is a marker of a functional complex of several
genes that may influence brain function by
regulation of neurodevelopment and neuronal cell signal- 
ing. For example, the gamma-1-syntrophin (SNTG1) gene 
encodes a brain-specific protein with two functional do- 
mains: one regulates alpha-adrenergic receptor signaling, 
and the other mediates dystrophin binding. Dystrophin 
interacts in turn with glycoprotein complexes, and another 
gene associated with 81_13 is glycoprotein-2 (GP2). GP2 is 
associated with risk of neuropathies, basal ganglia disorders,
and schizophrenia. PXDNL encodes a peroxidasin-like
endonuclease that selectively degrades mRNAs, suggesting 
that the SNP set 81_13 may function normally to maintain 
healthy neurodevelopment, but is associated with schizo- 
phrenia when deficient. 

A patient with moderate process, disorganized 
negative schizophrenia (associated with SNP set 19_2 
and phenotypic set 51_38) 

"Mr. B" was a 23-year-old man with DSM-IV schizophre- 
nia. At age 10, he began to collect odd things from the 
garbage and to speak in a vague, emotionless manner. He 
became childishly negativistic, obstinate, and isolated. By 
age 13, his behavior became more inappropriate and 
disorganized, and his speech was fragmented by frequent 
derailment (as in phenotypic set 51_38). He never had 
hallucinations. Occasionally he thought others were against 
him or making fun of him, but his convictions never lasted 
more than a few days and were not systematized or bizarre. 
He was increasingly unmotivated to initiate or persist in
goal-directed tasks; he completed high school (with parental
supervision) and then enrolled in college, but he soon
dropped out. At age 18, he was depressed and used illicit 
drugs briefly. At age 23, he had been continuously 
psychotic with moderate deterioration since onset. He 
was living with his parents, unmarried, unemployed, and 
considering trying college again. On mental status exami- 
nations from ages 16 to 23, he always had flat affect, 
average intelligence, and poor insight and judgment, 
which were accompanied by disorganized speech and 
behavior at times of perceived stress. 

The phenotypic set of disorganized speech and behavior 
was specifically associated with the SNP set 19_2, which 
carried a 100% risk of schizophrenia. This SNP set is 
a marker of a functional complex of several genes that 
act in concert with the gene GOLGA1 in ways that may
regulate the development and orchestration of
cortico-striatal circuits underlying motivated activity, in- 
cluding speech and emotional expression. GOLGA1 en- 
codes a key protein in the signaling pathways that regulate 
glycosylation and the transport of proteins and lipids in the 
Golgi apparatus. GOLGA1 alters splicing and polyadenyla- 
tion in the cerebral cortex in patients with schizophrenia 
compared to others. It acts in concert with many other 
genes related to 19_2. For example, the genes WDR38 and 
SCAI influence signaling pathways for cell migration and 
transcriptional regulation in the basal ganglia, which are 
critical for coordination of speech and emotional expression 
via the prefrontal-striatal-prefrontal loop. GOLGA1 variation 
has been associated with schizophrenia, Parkinson's disease, 
Sjögren's syndrome, and sleep disorders.

A patient with severe process, positive and negative 
schizophrenia (associated with SNP set 75_67 and 
phenotypic sets 15_13, 30_17, 61_18, and 65_64) 

"Ms. C" was a 35-year-old woman with DSM-IV schizo- 
phrenia and no history of substance abuse, depression, or 
mania. She required an individualized educational program 
for learning disabilities from age 6 on. At age 17, she began 
hearing voices that told her people were out to harm her. 
Her persecutory delusions about classmates led to conflict, 
so she dropped out of high school. Delusions of persecution 
and reference became widespread, and interfered with her 
functioning (as in phenotypic set 61_18). Her delusions were 
continuous (as in phenotypic set 65_64) but not bizarre or 
fragmented. She heard multiple voices talking in a chorus to 
her daily, telling her that people wanted to hurt her. Her 
hallucinations and delusions were accompanied by disorga- 
nized speech (as in phenotypic set 30_17) and prominent 
negative symptoms (flat affect, avolition, alogia). She was 
unable to work or marry and required supervision all her life, 
deteriorating severely over time (i.e., severe process as in 
phenotypic set 15_13). On mental status examination, she 
had childishly rude behavior, flat affect, tangential speech, 
poverty of abstract thinking, and poor insight and judgment 
about her illness and behavior. 

Ms. C's clinical profile of mixed positive and negative
symptoms with severe deterioration was specifically associated 
with the SNP set 75_67, which carried a 71% risk of schizo- 
phrenia. This SNP set is a marker of a functional complex of 
many genes that act in concert to regulate neurotrophic and 
neuroimmune functions in response to diverse environmental 
challenges. The gene for neurotrophin receptor-3 (NTRK3) 
regulates the production of neurotrophin, which promotes the
growth and survival of neurons, protecting them against
apoptosis in response to oxidative stress or glutamatergic
excitotoxicity. NETO2 modulates the plasticity of glutamate
neurotransmission at kainate receptors. GP2 (shared with 
81_13) transports antigens across cell membranes and mod- 
ulates adaptive immune responses. SNTG1 (shared with 
81_13), STYK1, and VANGL1 play diverse roles in neuronal 
proliferation, differentiation, and survival in response to injury. 



Replication: A Lock and Key Combination of 
Genomics and Phenomics 

Replication is always critical, but it is not usually sought 
within a single large study. Here, internal replicability was 
addressed by resampling techniques (94% support; see the 
online data supplement), where the same SNP sets are 
systematically identified despite the random alteration of 
the parameters of the method (17) and/or the sample (38). 
In addition, our biggest challenge was to identify studies 
with rich phenotypic data for independent external rep- 
lication. In most GWAS, phenotypic data have been of 
"secondary" interest, using a variety of structured or even 
unstructured interviews (14, 18, 19) (see the Replicability 
section in the data supplement). So why not attempt to 
replicate the genotypic architecture alone? The same 
answer applies for any method for validation of associa- 
tions: genetic variants associated with individuals may be, 
and in all likelihood often are, completely unrelated to the 
disease. The only way to make sense of these associations is 
to cross-match genomics with high-resolution phenomics 
(29). One can think of it as a "lock and key" combination (or, 
more precisely, many such combinations), where both 
pieces of information are needed to be able to interpret the 
results with confidence. Note that our approach complements
meta-analysis (39) and/or pathway analyses (40),
focusing the search on the combined genotypic-phenotypic 
architecture. 
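
The internal replicability check by resampling mentioned above can be illustrated with a generic sketch: re-run a set-finding procedure on random subsamples and count how often a given set is recovered. Everything here is a stand-in under stated assumptions. The toy detector, the genotype labels, and the Jaccard threshold are hypothetical and do not reproduce the method used in the study.

```python
import random


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resampling_support(subjects, detect, target, rounds=200, fraction=0.8,
                       min_jaccard=0.8, seed=0):
    """Fraction of random subsamples in which `target` is recovered by `detect`."""
    rng = random.Random(seed)
    size = int(fraction * len(subjects))
    hits = 0
    for _ in range(rounds):
        sample = rng.sample(list(subjects), size)
        # Only the members of the target set that were drawn can be recovered.
        expected = target & set(sample)
        if any(jaccard(expected, found) >= min_jaccard for found in detect(sample)):
            hits += 1
    return hits / rounds


# Hypothetical genotype labels: the first 6 of 20 subjects share label "g1".
GENOTYPE = {f"s{i}": ("g1" if i < 6 else "g2") for i in range(20)}


def toy_detect(sample):
    """Toy stand-in detector: group the sampled subjects by genotype label."""
    groups = {}
    for subject in sample:
        groups.setdefault(GENOTYPE[subject], set()).add(subject)
    return list(groups.values())


target = {s for s, g in GENOTYPE.items() if g == "g1"}
support = resampling_support(sorted(GENOTYPE), toy_detect, target)
print(f"support = {support:.0%}")
```

With a real, noisier detector the support would fall below 100%, and a value such as the 94% reported in the text would indicate how consistently the same sets reappear across resamples.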

Despite the described constraints, we successfully iden- 
tified more than 81% of the genotypic-phenotypic rela- 
tionships previously found in the MGS data set in two 
independent samples. These samples were the only ones 
where both genotypic and detailed phenotypic features 
were available and provided by the researchers. Remark- 
ably, the identification was performed with half of the 
SNPs used in the MGS study, because of the different 
platforms and our conservative preference to avoid ex- 
ternal imputations. The success of our replication efforts 
strongly supports the validity and power resulting from 
combining genomic and phenomic information in asso- 
ciation studies. 

Overall, we believe our approach is a pioneering effort to 
specify complex but manageable patterns of gene-gene
interaction underlying the polygenic risk of schizophrenia.
In addition, our results hold promise for the emergence of
a new era in clinical psychiatry in which person-centered
treatment of complex disorders can be guided by reliable
assessments of well-validated clinical syndromes and their 
specific causes. 